ABSTRACT

The portable diagnosis system (SPD) evaluates the safety and ride quality of railway vehicles
and the technical condition of the rail-vehicle interface. The objective of this article is to
estimate a nonlinear regression model of ride quality (motion behavior) by applying fuzzy
clustering algorithms to the geometric data that describe the technical condition of the
rail-vehicle interface and to the quasi-static lateral acceleration y*qst measured in different
vehicles. Performance is evaluated by comparing the measured acceleration y*qst with the
acceleration calculated by the model for 15 different vehicles. The results are then compared
with those of the multiple linear regression model previously used for the same purpose [16].
Introduction

The ride quality of passenger railway vehicles, according to the UIC-518 standard of the
International Union of Railways, is characterized by the quasi-static lateral acceleration
y*qst, for which the standard sets a limit value of 1.5 m/s² [7]. Testing and approval of
railway vehicles from the point of view of their dynamic behavior covers safety, track
fatigue and ride quality.


Because measuring the acceleration y*qst of each vehicle is costly, a tool (model) is needed
that predicts the acceleration from the 23 geometric variables that are routinely measured
during the preventive maintenance of the railway, without a dedicated y*qst measuring
process. Many of the traditional methods used to solve this problem are based on global
models such as polynomial models (ARMA, ARX, NARX, NARMAX) [4,6,12], radial basis
functions and neural networks [1,2,3,5], and fuzzy clustering [6,13], among others; some
of them are used in similar railway applications around the world [20,21].


Fuzzy clustering approximates a nonlinear regression problem by decomposing it into several
local linear models; this approach has advantages over global nonlinear models [13,16]. The
model structure is easy to understand and interpret, both qualitatively and quantitatively.
The approach also has computational advantages and lends itself to straightforward adaptive
and learning algorithms. To show its feasibility, we compare the results obtained by fuzzy
clustering with the Babuska toolbox [13] against those of the multiple linear regression
model previously used for the same purpose [16].


This article is part of the development of the SPD (Portable Diagnostic System)
[16,17,18,19,22,23,24], which measures vehicle variables in order to identify the technical
condition of the vehicle-rail interface.


The paper is organized as follows. Section 2 introduces the elements of the regression used
in the SPD system; section 3 reviews nonlinear regression; section 4 details the fuzzy
clustering methodology; and sections 5 and 6 present, respectively, the results with their
comparison against multiple linear regression [16] and the conclusions.


Study system 

The Metro system of Medellin was created in May 1979 by the Municipality of Medellin and
the Department of Antioquia, which founded the Metro de Medellin company.


Description of the railroad (Fig. 1):

• Line A: parallel to the Medellin River, 23.2 km long, with 19 stations in the
north-south direction.

• Line B: starts from the city centre at San Antonio B Station and runs westwards. It is
nearly 5.6 km long and has 7 stations.

• Linking Line: connects the two lines described above and is 3.2 km long.

• Line K: a cable transport system connected at Acevedo Station. It has 4 stations and
is 2.4 km long.


Fig. 1. Metro system of Medellin


To extract the data (both the estimators given by the UIC-518 standard and the geometric
variables), the complete track is considered and the measuring points are classified by
sections, as the UIC-518 standard recommends. The three zones proposed by the standard are
considered: tangent track, large-radius curves and small-radius curves; however, the section
lengths for each zone were adapted to the layout of Line A of the Metro system. The section
lengths considered were:

• tangent track: 160 m,
• large-radius curve track: 70 m,
• small-radius curve track: 70 m.


Data acquisition 

The Portable Diagnostic System (SPD) is a unique development for railway systems which,
besides evaluating safety and ride quality and monitoring the condition of the geometric
parameters of the track-vehicle interface, also enables multidimensional condition
monitoring and failure detection in the passenger vehicles of the Metro [16,22,23,24].


To develop this diagnosis tool, several modern and effective diagnostic methods were
combined, ranging from the selection of measurement points, through the evaluation method
of the UIC-518 standard, to an optimized forecasting method [7,16,22,23,24].


The system is composed of eight modules: sensors, signal processing, condition monitoring,
condition testing, detection of incipient failures in the wheel-rail interface, decision
support, forecast and presentation. Fig. 2 shows the module structure of the SPD.

Fig. 2. Modules of the SPD


The signals acquired by the SPD make it possible to calculate the lateral and longitudinal
forces generated in the vehicle along the track, which are needed for the safety evaluation.


The UIC-518 standard describes the experimental procedures for carrying out running tests
and analysing their results, in terms of dynamic behavior with respect to safety, track
fatigue and ride quality (motion behavior), for the purpose of approving vehicles for
international railway traffic.


Table 1 presents the estimators considered by the standard. Acceleration and force signals
had to be acquired at different parts of the train in order to calculate the estimators [5].


Table 1. Estimators for safety, ride quality, and track fatigue according to
the UIC-518 Standard.

Estimator      | Description                                                    | Units | Limit value
Sum of guiding forces per axle
ΣY2m (99.85%)  | Sum of guiding forces per axle, percentile 99.85%              | kN    | 66.7
ΣY2m (0.15%)   | Sum of guiding forces per axle, percentile 0.15%               | kN    | 66.7
               | Weighted r.m.s. of the sum of guiding forces per axle          |       |
               | Quasi-static force between wheel and rail                      | kN    | 60
Lateral acceleration in the vehicle body
               | Lateral acceleration in the vehicle body, percentile 99.85%    | m/s²  |
               | Lateral acceleration in the vehicle body, percentile 0.15%     | m/s²  |
               | Weighted r.m.s. of lateral acceleration in the vehicle body    | m/s²  | 0.5
               | Quasi-static acceleration in the vehicle                       | m/s²  |
Vertical acceleration in the vehicle body
               | Vertical acceleration in the vehicle body, percentile 99.85%   | m/s²  |
               | Vertical acceleration in the vehicle body, percentile 0.15%    | m/s²  |
               | Weighted r.m.s. of vertical acceleration                       | m/s²  |


Because this article is limited to the evaluation of ride quality, the estimator used is the
acceleration y*qst. According to the UIC-518 standard [7], the limit value of this
acceleration, 1.5 m/s², defines the ride quality or motion behavior of the vehicle.





This estimator is obtained from the lateral acceleration signal measured in the passenger
compartment (vehicle body). The measurements are filtered with an eighth-order digital
Butterworth filter with a cutoff frequency of 20 Hz.

Geometric variables

The current maintenance routines of the railway system include the measurement of several
geometric variables that indicate the technical condition of the rail. They are presented
in Table 2.


Table 2. State condition of variables.

Variable | Description                                                             | Units | Limit
X1       | Equivalent conicity with standard deviation of 1.25 under the UK method |       | N/A
X2       | Equivalent conicity with standard deviation of 2.5 under the UK method  |       | N/A
X3       | Equivalent conicity with standard deviation of 3.75 under the UK method |       |
         | Maximum vehicle speed                                                   | km/h  | 80

Re [Cunenads ———SSOSCSC~—S~—~—~‘i mY 


X10      | Height difference between the heads of the high and low rails           | mm    | 3
         | Gap between the internal rail faces                                     | mm    | 3
         | Synthetic coefficient of the railroad quality                           | mm    | 0
         | Vertical wear of the rail head for the high rail (east-south)           | mm    | 12
X15      | Vertical wear of the rail head for the high rail (west-north)           | mm    | 12
X16      | r.m.s. of the corrugation for the high rail, wavelength 30 to 100 mm    | mm    | 10
X17      | Excess percentage for the high rail, wavelength 30 to 100 mm            | %     | 50
X18      | r.m.s. of the corrugation for the high rail, wavelength 100 to 300 mm   | mm    | 20
X19      | Excess percentage for the high rail, wavelength 100 to 300 mm           | %     | 50
X20      | r.m.s. of the corrugation for the low rail, wavelength 30 to 100 mm     | mm    | 10
X21      | Excess percentage for the low rail, wavelength 30 to 100 mm             | %     | 50
X23      | r.m.s. of the corrugation for the low rail, wavelength 100 to 300 mm    | mm    | 20
X24      | Excess percentage for the low rail, wavelength 100 to 300 mm            | %     | 50


Principles of regression 

Fuzzy systems are, in general, function approximators, so they can also be used in nonlinear
regression problems. Nonlinear regression models the static dependence of a response
variable $y \in Y \subset \mathbb{R}$ on a regression vector $x = [x_1, x_2, \ldots, x_p]^T$
defined over a domain $X \subset \mathbb{R}^p$. The elements of the regression vector are
called regressors and the domain X is called the regressor space. The system that generates
the data can be described by:

$$ y \approx f(x) \tag{1} $$

The deterministic function $f(\cdot)$ captures the dependence of y on x, and the symbol
$\approx$ reflects that y is not an exact function of x. The objective of regression is to
use the data to build a function F(x) that approximates f(x) not only at the data points but
over the whole domain X. What counts as a reasonable approximation depends on the purpose
for which the model is built. If the objective is to predict y, accuracy is the most
relevant criterion. The lack of accuracy is usually measured by the integral error over the
domain:

$$ I = \int_X \| f(x) - F(x) \| \, dx \tag{2} $$


In general this error cannot be computed, since the value of f is known only at the
available data. Instead, the average prediction error over the available data is often used:

$$ J = \frac{1}{N} \sum_{i=1}^{N} \| y_i - F(x_i) \| \tag{3} $$

where N is the number of data points in the sample.


Besides prediction accuracy, the objective may also be to obtain a model that helps to
analyse and understand the real properties of the system that generates the data. The
strength of fuzzy models is that they describe systems as a collection of simple local
sub-models expressed by rules. The rules can be formulated in natural language, which is
easier to understand than mathematical language. The rules can also combine analytical
models commonly used in control engineering, such as the local linear models of the
Takagi-Sugeno approach [12].


Nonlinear regression model 
The inputs of our model are the 23 geometric variables of the rail condition, from which the
modeled acceleration is calculated.

An array is formed with one row for each geometric variable measured in each section and
one column for each of the N sections. This array is called the observation matrix Z
(regressor space):

$$ Z = \begin{bmatrix}
z_{11} & z_{12} & \cdots & z_{1N} \\
z_{21} & z_{22} & \cdots & z_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
z_{n1} & z_{n2} & \cdots & z_{nN}
\end{bmatrix} \tag{4} $$

In clustering terminology, the columns of the observation matrix are traditionally called
patterns or objects, while the rows are called features or attributes.


Fuzzy clustering
A cluster is defined as a subset of data whose members are more similar to one another than
to data from other subsets.

There are different types of data association or clustering. One of the most popular is
"hard clustering", which groups data into mutually exclusive clusters (see Fig. 3), meaning
that each data point belongs to exactly one cluster and never to several clusters at the
same time. In Fig. 3, a data point lying between the clusters c1 and c2 could belong to
both; hard clustering does not take this into account.


Fig. 3. Data set
(Delft Center for Systems and Control, TU Delft, Babuska R.)


It is reasonable to think that on the border between two clusters c1 and c2 there are
points with some degree of membership in both. The fuzzy c-means algorithm (Bezdek in Jang,
1997) allows each point to belong to a cluster with a certain degree of membership, so a
point can belong to several clusters. In some real situations this makes fuzzy clustering
more natural than hard clustering.


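Fuzzy partition

The objective of clustering is to divide the data set $Z = \{z_1, z_2, \ldots, z_N\}$ into
c clusters ($2 \le c < N$) by means of a partition matrix $U = [\mu_{ik}]$, where
$\mu_{ik}$ is the degree of membership of point k in cluster i. U represents a fuzzy
partition if its elements meet the following conditions:

$$ \mu_{ik} \in [0, 1], \quad 1 \le i \le c, \quad 1 \le k \le N \tag{5} $$

$$ \sum_{i=1}^{c} \mu_{ik} = 1, \quad 1 \le k \le N \tag{6} $$

$$ 0 < \sum_{k=1}^{N} \mu_{ik} < N, \quad 1 \le i \le c \tag{7} $$

The fuzzy partition space is defined as:

$$ M_{fc} = \left\{ U \in \mathbb{R}^{c \times N} \;\middle|\; \mu_{ik} \in [0, 1], \forall i, k;
\; \sum_{i=1}^{c} \mu_{ik} = 1, \forall k; \; 0 < \sum_{k=1}^{N} \mu_{ik} < N, \forall i \right\} $$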
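
The conditions (5) to (7) can be checked mechanically. The following is a minimal sketch, assuming the partition matrix is stored as a list of c rows (one per cluster) with N entries each; the function name `is_fuzzy_partition` and the tolerance `tol` are illustrative choices, not part of the original work.

```python
from typing import List


def is_fuzzy_partition(U: List[List[float]], tol: float = 1e-9) -> bool:
    """Check conditions (5)-(7) for a c x N membership matrix U (rows = clusters)."""
    c = len(U)
    N = len(U[0])
    # (5) every membership degree lies in [0, 1]
    if any(not (0.0 <= u <= 1.0) for row in U for u in row):
        return False
    # (6) the memberships of each data point sum to 1 over the clusters
    for k in range(N):
        if abs(sum(U[i][k] for i in range(c)) - 1.0) > tol:
            return False
    # (7) no cluster is empty and no cluster holds all the data
    return all(0.0 < sum(row) < N for row in U)


# A point on the border (second column) shares its membership between both clusters.
U_fuzzy = [[0.9, 0.5, 0.1],
           [0.1, 0.5, 0.9]]
print(is_fuzzy_partition(U_fuzzy))
```

A hard partition, with memberships restricted to 0 and 1, passes the same check, which illustrates that hard clustering is a special case of the fuzzy partition.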


Algorithm for Fuzzy C-Means 

There are different algorithms for fuzzy clustering; the most widely used is the fuzzy
c-means algorithm. It partitions the data by minimizing the objective function [6]:

$$ J(Z; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m d_{ik}^2 \tag{8} $$

where:

$$ Z = \{z_1, z_2, \ldots, z_N\} \tag{9} $$
is the data set to classify,

$$ U = [\mu_{ik}] \in M_{fc} \tag{10} $$
is the partition matrix of Z,

$$ V = [v_1, v_2, \ldots, v_c], \quad v_i \in \mathbb{R}^n \tag{11} $$
is the vector of cluster centres to be found,

$$ d_{ik}^2 = \| z_k - v_i \|^2 \tag{12} $$
is the squared Euclidean norm, the distance from a data point to the centre of a cluster, and

$$ m \in (1, \infty) \tag{13} $$
is an exponent that determines the fuzziness of the resulting clusters.


The steps of the algorithm are:
• select an initial membership matrix,
• set the number of clusters,
• calculate the centroids of the clusters,

$$ v_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^m z_k}{\sum_{k=1}^{N} (\mu_{ik})^m} \tag{14} $$

• calculate the Euclidean distance,

$$ d_{ik}^2 = (z_k - v_i)^T (z_k - v_i) \tag{15} $$

• update the membership matrix,

$$ \mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik} / d_{jk} \right)^{2/(m-1)}} \tag{16} $$
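
The iteration described by the steps above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the code of the toolbox used in this work: the random initialization, the stopping rule on the largest change in the membership matrix, and the names `fcm`, `tol`, `max_iter` and `seed` are all choices made here for the example.

```python
import random
from typing import List, Tuple

Vector = List[float]


def fcm(Z: List[Vector], c: int, m: float = 2.0, tol: float = 1e-6,
        max_iter: int = 200, seed: int = 0) -> Tuple[List[List[float]], List[Vector]]:
    """Fuzzy c-means on N points Z; returns (U, V) with U of size c x N."""
    rng = random.Random(seed)
    N, n = len(Z), len(Z[0])
    # Random initial memberships, normalised so that every column sums to 1.
    U = [[rng.random() + 1e-3 for _ in range(N)] for _ in range(c)]
    for k in range(N):
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    V: List[Vector] = []
    for _ in range(max_iter):
        # Eq. (14): cluster centres as membership-weighted means.
        V = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(N)]
            V.append([sum(w[k] * Z[k][j] for k in range(N)) / sum(w) for j in range(n)])
        # Eq. (15): squared Euclidean distances.
        D2 = [[sum((Z[k][j] - V[i][j]) ** 2 for j in range(n)) for k in range(N)]
              for i in range(c)]
        # Eq. (16): membership update; (d_ik/d_jk)^(2/(m-1)) = (D2_ik/D2_jk)^(1/(m-1)).
        U_new = [[0.0] * N for _ in range(c)]
        for k in range(N):
            zero = [i for i in range(c) if D2[i][k] == 0.0]
            if zero:  # the point coincides with one or more centres
                for i in zero:
                    U_new[i][k] = 1.0 / len(zero)
                continue
            for i in range(c):
                U_new[i][k] = 1.0 / sum((D2[i][k] / D2[j][k]) ** (1.0 / (m - 1.0))
                                        for j in range(c))
        change = max(abs(U_new[i][k] - U[i][k]) for i in range(c) for k in range(N))
        U = U_new
        if change < tol:
            break
    return U, V


# Two well-separated one-dimensional groups.
Z = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
U, V = fcm(Z, c=2)
print(sorted(round(v[0], 2) for v in V))
```

The number of clusters `c` has to be supplied by the caller, and the outcome depends on the random starting partition, which is why `seed` is exposed as a parameter.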


Equation (14) gives the value $v_i$, which is the weighted average of the data belonging to
a cluster, where the weights are the membership degrees. The algorithm has the following
disadvantages:

• the final result depends on the initial partition,

• the number of clusters must be defined at the beginning of the algorithm,

• the Euclidean distance detects only spherical clusters.

The last disadvantage is a real drawback because the natural shape of a data group is often
an ellipse (Fig. 4). A more appropriate algorithm is therefore the Gustafson-Kessel
algorithm, which looks for hyperellipsoidal clusters and thus detects the quasi-linear
behavior of the data well.

Fig. 4. Clusters of different shapes
(Delft Center for Systems and Control, TU Delft, Babuska R.)


Gustafson-Kessel (GK) algorithm 

This algorithm belongs to the family of adaptive-distance algorithms. It extends fuzzy
c-means by choosing a different norm matrix $B_i$ for each cluster instead of keeping the
norm constant:

$$ d_{ikB_i}^2 = (z_k - v_i)^T B_i (z_k - v_i) \tag{17} $$

where the matrices $B_i$ are optimization variables of the objective function and are
related to the covariance of each cluster.

The objective function is then defined as:

$$ J(Z; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m d_{ikB_i}^2 \tag{18} $$

To obtain a feasible solution, $B_i$ must be constrained in some way. Here its volume is
kept constant by fixing the determinant of $B_i$, which gives:

$$ B_i = \left[ \rho_i \det(F_i) \right]^{1/n} F_i^{-1} \tag{19} $$

where $F_i$ is the covariance matrix of cluster i and $\rho_i$ is the fixed value of
$\det(B_i)$.
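
To see how the cluster-specific norm of (17) and (19) differs from the Euclidean distance, the sketch below evaluates it for the two-dimensional case (n = 2), where the determinant and the inverse have closed forms. The function name `gk_distance_2d`, the default `rho = 1.0` and the example covariance matrix are assumptions made for illustration only.

```python
from typing import Sequence


def gk_distance_2d(z: Sequence[float], v: Sequence[float],
                   F: Sequence[Sequence[float]], rho: float = 1.0) -> float:
    """Squared Gustafson-Kessel distance, eq. (17), with B from eq. (19), for n = 2."""
    (f11, f12), (f21, f22) = F
    det = f11 * f22 - f12 * f21
    if det <= 0.0:
        raise ValueError("F must be positive definite")
    scale = (rho * det) ** 0.5          # [rho * det(F)]^(1/n) with n = 2
    # Closed-form inverse of the 2 x 2 covariance matrix.
    inv = [[f22 / det, -f12 / det],
           [-f21 / det, f11 / det]]
    B = [[scale * inv[r][s] for s in range(2)] for r in range(2)]
    d = [z[0] - v[0], z[1] - v[1]]
    return sum(d[r] * B[r][s] * d[s] for r in range(2) for s in range(2))


# Hypothetical cluster elongated along the first axis.
F = [[4.0, 0.0],
     [0.0, 1.0]]
print(gk_distance_2d([2.0, 0.0], [0.0, 0.0], F))  # along the long axis
print(gk_distance_2d([0.0, 2.0], [0.0, 0.0], F))  # across the long axis
```

Both test points are at the same Euclidean distance from the centre, yet the point lying along the elongated direction of the cluster is considered closer, which is what allows ellipsoidal clusters to be detected.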


The GK algorithm suits the identification purpose because it has the following
characteristics:

• the dimension of the clusters is limited only by the distance measure and by the
definition of the cluster prototype as a point;

• in comparison with other algorithms, GK is relatively insensitive to the initialization
of the partition matrix.


Once the data groups are available, the next step is to derive the inference rules that
define a fuzzy model. Different rule types can be used, such as:

• Mamdani: fuzzy rules with fuzzy antecedents and fuzzy consequents.

• Takagi-Sugeno (TS): fuzzy rules with fuzzy antecedents and consequents that can be
expressed in a simple form, such as a first-order linear model [12].


Because the TS fuzzy model is an effective tool for approximating nonlinear systems from
input-output data through the interpolation of local linear models, which in this case are
determined by the clusters, we use the TS model to identify the model we are looking for.

The solution consists of projecting the membership degrees of each cluster onto the desired
space (Fig. 5), thus obtaining membership functions from the clusters.


Fig. 5. Extraction of rules by fuzzy clustering
(Delft Center for Systems and Control, TU Delft, Babuska R.)







E-ISSN No : 2454-9916 | Volume: 2 | Issue: 12 | Dec 2016 


Takagi-Sugeno Model 
In the Takagi-Sugeno model, the rule consequents are functions of the inputs:

$$ R_i: \text{If } x \text{ is } A_i \text{ then } y_i = f_i(x), \quad i = 1, 2, \ldots, K \tag{20} $$

where $x \in \mathbb{R}^p$ is the input variable (antecedent), $A_i$ is a multidimensional
fuzzy set (cluster), $y_i$ is the output variable (consequent), $R_i$ is the i-th rule and
K is the number of rules in the rule set.

The consequent function can be expressed linearly as:

$$ y_i = a_i^T x + b_i \tag{21} $$

Substituting (21) in (20) we get:

$$ R_i: \text{If } x \text{ is } A_i \text{ then } y_i = a_i^T x + b_i \tag{22} $$

Given the outputs $y_i$ of the individual consequents, the global output of the
Takagi-Sugeno model is calculated by:

$$ y = \frac{\sum_{i=1}^{K} \beta_i(x) \, y_i}{\sum_{i=1}^{K} \beta_i(x)} \tag{23} $$

where $\beta_i$ is the degree of fulfillment of the antecedent of the i-th rule, calculated
as the degree of membership of x in the cluster $A_i$:

$$ \beta_i(x) = \mu_{A_i}(x) \tag{24} $$

Normalizing,

$$ h_i(x) = \frac{\mu_{A_i}(x)}{\sum_{j=1}^{K} \mu_{A_j}(x)} \tag{25} $$

so the TS model can be interpreted as a quasi-linear model whose parameters depend on the
input x:

$$ y = \sum_{i=1}^{K} h_i(x) \left( a_i^T x + b_i \right) \tag{26} $$


Figs. 6 and 7 show an example of a function y = f(x) represented by four TS rules.

Fig. 6. Takagi-Sugeno fuzzy clustering. Rule-based description: If x is A1 then
y = a1 x + b1; if x is A2 then y = a2 x + b2; etc.
(Delft Center for Systems and Control, TU Delft, Babuska R.)

Fig. 7. GK and TS fuzzy clustering
The antecedent of each rule defines a (fuzzy) validity region for the corresponding linear
model in the consequent. The global output function is calculated by weighting the local
linear models.
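
The weighting of local linear models in (23) to (26) can be illustrated with a small sketch. The Gaussian membership functions and the two rules below are hypothetical stand-ins for the membership functions obtained by projecting the clusters; the names `ts_output` and `gaussian` are likewise illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A rule is (membership function of A_i, slope vector a_i, offset b_i).
Rule = Tuple[Callable[[Sequence[float]], float], Sequence[float], float]


def ts_output(x: Sequence[float], rules: List[Rule]) -> float:
    """Global Takagi-Sugeno output, eq. (26): weighted mean of local linear models."""
    beta = [mu(x) for mu, _, _ in rules]                  # eq. (24)
    total = sum(beta)
    if total == 0.0:
        raise ValueError("no rule is active for this input")
    h = [b / total for b in beta]                         # eq. (25)
    local = [sum(ai * xi for ai, xi in zip(a, x)) + b0    # eq. (22)
             for _, a, b0 in rules]
    return sum(hi * yi for hi, yi in zip(h, local))


def gaussian(center: float, width: float) -> Callable[[Sequence[float]], float]:
    """Hypothetical one-dimensional membership function standing in for a projected cluster."""
    return lambda x: math.exp(-((x[0] - center) / width) ** 2)


# Two hypothetical rules: y = x near x = 0 and y = -x + 8 near x = 4.
rules = [(gaussian(0.0, 1.0), [1.0], 0.0),
         (gaussian(4.0, 1.0), [-1.0], 8.0)]
print(ts_output([2.0], rules))  # halfway between the rules: average of 2 and 6
```

Close to the centre of a cluster the output follows that cluster's linear model almost exactly, while between clusters the output blends the neighbouring models smoothly.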


Numerical result 

In this work we used the toolbox developed by Babuska [13] at the Delft Center for Systems
and Control, which runs in MATLAB. Once the data are collected in the observation matrix,
they are pre-processed by centering the matrix (Fig. 8).


Fig. 8. Normalization of a matrix

In Fig. 8, x̄ is the mean of each variable and s is its standard deviation.
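
A minimal sketch of this pre-processing step follows, assuming that each row of the matrix holds one variable and that the sample standard deviation (N - 1 in the denominator) is used; the text does not state which estimator was applied, and the function name `normalize_rows` is illustrative.

```python
import statistics
from typing import List


def normalize_rows(Z: List[List[float]]) -> List[List[float]]:
    """Centre and scale each variable (row of Z) to zero mean and unit standard deviation."""
    out = []
    for row in Z:
        mean = statistics.mean(row)
        s = statistics.stdev(row)   # sample standard deviation (assumption)
        if s == 0.0:
            raise ValueError("a constant variable cannot be scaled")
        out.append([(value - mean) / s for value in row])
    return out


print(normalize_rows([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]))
```

After this step, variables measured in different units (mm, %, km/h) contribute on a comparable scale to the distances used by the clustering algorithm.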


The quality of the model is evaluated through the average error; its equivalent in the
toolbox is the percentage variance accounted for (VAF) [13] between the real and the
estimated data.

This coefficient is computed between two signals $y_1$ and $y_2$:

$$ \mathrm{VAF} = 100\% \cdot \left[ 1 - \frac{\mathrm{var}(y_1 - y_2)}{\mathrm{var}(y_1)} \right] $$

The VAF is 100% if both signals are equal. If the signals are quite different, the VAF
tends to zero.
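
The VAF formula can be written directly in a few lines. This sketch follows the expression above and is not the toolbox routine itself; the function name `vaf` and the use of the population variance are choices made here (the ratio is the same for the sample variance, since both signals have the same length).

```python
import statistics
from typing import Sequence


def vaf(y_measured: Sequence[float], y_model: Sequence[float]) -> float:
    """Percentage variance accounted for between a measured and a modeled signal."""
    if len(y_measured) != len(y_model):
        raise ValueError("signals must have the same length")
    residual = [a - b for a, b in zip(y_measured, y_model)]
    return 100.0 * (1.0 - statistics.pvariance(residual) / statistics.pvariance(y_measured))


y = [1.0, 2.0, 3.0, 4.0]
print(vaf(y, y))            # identical signals
print(vaf(y, [2.5] * 4))    # a constant model explains none of the variance
```

Because the VAF is based on the variance of the residual, a model that differs from the measurement only by a constant offset would still score 100%, so the plots of measured against modeled acceleration remain a useful complement.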


For each vehicle, the following procedure was followed:
• the matrix is normalized;

• the matrix is divided into two parts, one to identify the model and the other to verify
the model obtained;

• the real acceleration is plotted together with the output of the multiple linear
regression model [14,15], which was previously used to determine the ride quality model
[16] and therefore serves as a reference for validating the result of the fuzzy clustering
method;

• the VAF coefficient is calculated to determine the accuracy of the model with respect to
the real data.
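
The division of the data into an identification part and a verification part can be sketched as follows. The split fraction, the shuffling of the samples before the split and the function name `shuffle_split` are assumptions for illustration; the text does not state the proportions used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(samples: Sequence[T], train_fraction: float = 0.5,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them into identification and verification sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical track sections, 70 % used for identification.
identification, verification = shuffle_split(range(10), train_fraction=0.7)
print(len(identification), len(verification))
```

Keeping the verification samples out of the identification step is what makes the resulting VAF an estimate of how the model behaves on sections it has not seen.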


The best results were obtained using the toolbox with the following parameters:

FM.c    = 12;    % number of clusters
FM.m    = 2.8;   % fuzziness parameter
FM.tol  = 0.1;   % termination criterion
FM.ante = 2;     % 2 - projected MFS
FM.cons = 2;     % 2 - weighted LS

where FM is the structure defined by Babuska in MATLAB for the parameters of the toolbox.


Table 3 presents the results obtained with the fuzzy nonlinear regression model and those
obtained with the multiple regression model [16].

Table 3. Results

Fig. 9 shows the measured acceleration data and the output of the model obtained by fuzzy
clustering for vehicle 05.

Fig. 9. Plot of y*qst vs. fuzzy model output, vehicle 05


Note that the table of results contains a VAF of 100%, which corresponds to points lying on
the 45° line of the figure and to a faithfully reproduced acceleration, as shown in Fig. 10.


Fig. 10. Real acceleration curve vs. model curve, vehicle 05 (tangent track). Legend: real
data; general model, fuzzy clustering; general model, linear regression. Vertical axis: y*qst.

In Table 3, the worst VAF coefficient corresponds to vehicle 40, with a VAF of 92.28. When
the data are plotted (Fig. 11), it can be clearly seen where the model did not estimate the
data well, with points falling off the 45° comparison line (see also Fig. 12).

Fig. 11. Plot of y*qst vs. fuzzy model output, vehicle 40

This could be due to tuning problems of the model, to the sensor installation, to different
geometric conditions of the rail, to the equipment capacity, etc.

Fig. 12. Real acceleration curve vs. model curve, vehicle 40


To obtain a general model, all the samples of the 15 vehicles were gathered in one matrix
and pre-processed by randomly shuffling its rows. The same procedure used for each
individual vehicle was then applied, giving a VAF of 97.35. The result is shown graphically
in Figs. 13 and 14.

Fig. 14. Real acceleration curve vs. general model curve


Conclusions 

In this article, the main aspects of fuzzy clustering for model identification were
reviewed. Although the results obtained with multiple linear regression are satisfactory,
the comparison shows that the fuzzy model is better in 14 of the 15 vehicles analysed; in
only one vehicle does the multiple linear regression model outperform the fuzzy model.


We showed that fuzzy clustering, especially with the Takagi-Sugeno model, is a good tool
for approximating nonlinear functions.


This regression model can be integrated into the decision support process for the
maintenance of the rail-vehicle interface, in order to reduce the cost of maintenance work
and human resources and to increase system reliability.

For the reasons explained above, we recommend that the fuzzy model be used to identify
nonlinear models in future implementations within the SPD.